Implementing a Leading Loads Performance Predictor on Commodity Processors
Abstract
Modern CPUs employ Dynamic Voltage and Frequency Scaling (DVFS) to boost performance, lower power, and improve energy efficiency. Good DVFS decisions require accurate performance predictions across frequencies. A new hardware structure for measuring leading load cycles was recently proposed and demonstrated promising performance prediction abilities in simulation. This paper proposes a method of leveraging existing hardware performance monitors to emulate a leading loads predictor. Our proposal, LL-MAB, uses existing miss status handling register occupancy information to estimate leading load cycles. We implement and validate LL-MAB on a collection of commercial AMD CPUs. Experiments demonstrate that it can accurately predict performance with an average error of 2.7% using an AMD Opteron™ 4386 processor over a 2.2x change in frequency. LL-MAB requires no hardware- or application-specific training, and it is more accurate and requires fewer counters than similar approaches.
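The abstract does not spell out the prediction formula, so the following is only a minimal sketch of the commonly used leading-loads model, not the paper's implementation. It assumes that cycles spent waiting on leading loads take a fixed amount of wall-clock time regardless of core frequency, while all remaining cycles scale with the core clock. The function name `predict_time` and its parameters are hypothetical, and the counter values in the usage lines are made up for illustration.

```python
def predict_time(total_cycles, leading_load_cycles, f_measured_hz, f_target_hz):
    """Predict execution time (seconds) at f_target_hz from counts taken at f_measured_hz.

    Assumption (leading-loads model): time spent on leading loads is memory-bound
    and does not change with core frequency; everything else scales with frequency.
    """
    if f_measured_hz <= 0 or f_target_hz <= 0:
        raise ValueError("frequencies must be positive")
    if not 0 <= leading_load_cycles <= total_cycles:
        raise ValueError("leading_load_cycles must lie in [0, total_cycles]")

    # Memory-bound portion: constant wall-clock time, measured at the sampling frequency.
    memory_time = leading_load_cycles / f_measured_hz
    # Core-bound portion: same cycle count, executed at the target frequency.
    core_time = (total_cycles - leading_load_cycles) / f_target_hz
    return memory_time + core_time


# Hypothetical counts sampled at 2.0 GHz, predicted for 1.0 GHz.
slow = predict_time(2_000_000_000, 500_000_000, 2.0e9, 1.0e9)
same = predict_time(2_000_000_000, 500_000_000, 2.0e9, 2.0e9)
```

In this sketch, a workload with more leading load cycles loses less performance when the frequency drops, which is the property a DVFS policy would exploit.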
Similar resources
Implementing Data-Parallel Programs
Parallel computing is moving rapidly from an era of "Big Iron" to a future that will be dominated by systems built from commodity components. Users will be able to construct high-performance systems by clustering off-the-shelf processors using widely available high-speed switches. A key question is how to organize the supporting software to best leverage these commodity hardware systems. We addr...
Parallel Programming Models and Paradigms
In the 1980s it was believed computer performance was best improved by creating faster and more efficient processors. This idea was challenged by parallel processing, which in essence means linking together two or more computers to jointly solve a computational problem. Since the early 1990s there has been an increasing trend to move away from expensive and specialized proprietary parallel superc...
Exploiting selective instruction reuse and value prediction in a superscalar architecture
In our previously published research we discovered some very difficult to predict branches, called unbiased branches. Since the overall performance of modern processors is seriously affected by misprediction recovery, especially these difficult branches represent a source of important performance penalties. Our statistics show that about 28% of branches are dependent on critical Load instructio...
Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on single-threaded model of computation, using only a small fraction of t
Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on single-threaded model of computation, using only a small fraction of the computational power available now for most desktops and laptops. Modern statistical software packages rely on high performance implementatio...
Fast Packet Forwarding on Commodity Platforms
Rather than using special-purpose hardware routers, software routers enable routing on commodity platforms. However, even with faster processors and multi-core platforms, the performance of software routers on commodity platforms today does not scale with high speed. We identify the limitations of commodity platforms by comparing them to high-end routers. In high end routers, each line card has...